HADOOP-16635. S3A innerGetFileStatus scans for directories-only still does a HEAD.#1601
Conversation
16d7149 to
1dc8eb6
Compare
1dc8eb6 to
938bc6e
Compare
… does a HEAD. 1. Make sure the probe set is passed down. 2. check for Head probe wraps only HEAD request. Change-Id: I7664a80da53f85c42c319cd6b56b3d6366e09fc6
* move assertion in ITestS3GuardTTL up to test setup * better resilience in ITestAuthoritativePath Change-Id: I88fed4779051de661ff0a7dd33a21d4f5004b096
…arded. This guarantees the unguarded cost is always measured. Change-Id: I380d6babbd6bf944bf9b69a81865755f5b99f8b6
|
Not really an expert on the S3AFs core functionality. That said, the current code does seem quite broken - if HEAD is not included in the probe set. This may need a minor change. I believe the leading key.endsWith("/") is to avoid duplicating a HEAD request - if the key already contains a "/". However, if the set of probes does not contain "Head", and it does contain "DirMarker", but the key ends in a "/" - there won't be any HEAD requests. Is that expected? Other than this, looks good to me. General question, unrelated to the patch, is a single LIST not sufficient to cover both the with and without "/" scenario? |
938bc6e to
80950e0
Compare
|
Sid, thanks for the comments, will review/update the patch Interesting point about the double list. This code path is how its always been, presumably descended from the s3n code. LIST is slower, costs more and much more prone to eventual consistency, which are all good arguments for HEAD first. I actually plan to tune some of the calls which always seem to get used on directory walks (listStatus, listFiles, listLocatedStatus) to do the subtree list first, and only go for the HEAD calls if they don't find any children. This is to reduce the cost of treewalks where the bias is towards populated directories |
… marker probe Address Sid's feedback about handling of keys ending in / by always using the dir marker HEAD for them. Added tests to verify that, and another to verify that the create file operation only does one HEAD when overwrite == true Change-Id: I6d7f39e41738adcbe2ab5609a146310d7e36b557
| } else { | ||
| LOG.debug("Found exact file: normal file"); | ||
| if (!key.isEmpty()) { | ||
| if (probes.contains(StatusProbeEnum.Head) && !key.endsWith("/")) { |
There was a problem hiding this comment.
Not sure if this is ready to review.
With this change - if someone asks for a "HEAD" only, on a URL which happens to have a trailing "/" nothing will be looked up.
IF HAED + DIRMARKER - then yes, at least 1 request will go out.
There was a problem hiding this comment.
yes. That's exactly my thought.
note: none of this API is public, its for avoiding problems on ops where we don't want to look for a file but do for a dir marker. And AFAIK, you can't go from a Path to a / as we strip that off.
How about
- I do a review of all places where we don't ask for the HEAD? So far I think I'm only doing it in create
- I clarify in javadocs
There was a problem hiding this comment.
this is handled in https://issues.apache.org/jira/browse/HADOOP-15430 right?
- and list the issues related to it. Change-Id: Ia6e47fe1f60c7e17e97941d8ddcc03d524e27578
Now. can you create a Path with a trailing / ? I was about to say no, but remembered https://issues.apache.org/jira/browse/HADOOP-15430 .. one of the constructors of Path does let you get away with it, which is something which breaks S3Guard already |
|
🎊 +1 overall
This message was automatically generated. |
bgaborg
left a comment
There was a problem hiding this comment.
+1;
tested against ireland, no errors.
|
thx |
Change-Id: I7664a80da53f85c42c319cd6b56b3d6366e09fc6